380 research outputs found
Large-scale interactive exploratory visual search
Large-scale visual search has been one of the challenging issues of the big-data era. It demands techniques that are not only highly effective and efficient but also allow users to conveniently express their information needs and refine their intent. In this thesis, we develop an exploratory framework for large-scale visual search, together with a number of enabling techniques: compact visual content representation for scalable search, near-duplicate video shot detection, and action-based event detection. We propose a novel scheme for extremely low-bit-rate visual search, which transmits compressed visual words, consisting of a vocabulary tree histogram and descriptor orientations, rather than raw descriptors. Compact representation of video data is achieved by identifying keyframes of a video, which also helps users comprehend visual content efficiently; we propose a novel Bag-of-Importance model for static video summarization. Near-duplicate detection is a key issue for large-scale visual search, since a large number of nearly identical images and videos exist; we propose an improved near-duplicate video shot detection approach for more effective shot representation. Event detection is one way to bridge the semantic gap in visual search. We particularly focus on human-action-centred event detection and propose an enhanced sparse coding scheme to model human actions; the proposed approach significantly reduces computational cost while achieving recognition accuracy highly comparable to state-of-the-art methods. Finally, we propose an integrated solution addressing the prime challenges of large-scale interactive visual search. The proposed system is also among the first attempts at exploratory visual search, providing users with more robust results to support their exploration.
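As a rough, self-contained illustration of the vocabulary-tree idea mentioned above (not the thesis implementation; the branching factor, depth, and data here are made up), a hierarchical k-means codebook quantizes each local descriptor by descending the tree, and the leaf index serves as the visual word accumulated into a histogram:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(data, k, iters=10):
    # minimal Lloyd's k-means (illustrative, no convergence checks)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        d = ((data[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = data[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def build_tree(data, branch, depth):
    # each node stores `branch` centers; leaves have no children
    node = {"centers": kmeans(data, branch), "children": None}
    if depth > 1:
        d = ((data[:, None, :] - node["centers"][None]) ** 2).sum(-1)
        labels = d.argmin(1)
        node["children"] = [
            build_tree(data[labels == j], branch, depth - 1)
            if (labels == j).sum() >= branch else None
            for j in range(branch)
        ]
    return node

def quantize(desc, node, branch=4):
    # descend the tree; the leaf index is the visual-word id
    path = 0
    while node is not None:
        j = int(((node["centers"] - desc) ** 2).sum(-1).argmin())
        path = path * branch + j
        node = node["children"][j] if node["children"] else None
    return path

descs = rng.normal(size=(500, 8))          # toy local descriptors
tree = build_tree(descs, branch=4, depth=2)  # 16 leaves
word = quantize(descs[0], tree)
hist = np.bincount([quantize(d, tree) for d in descs], minlength=16)
```

In a real system the tree is far deeper and wider (up to millions of leaves), and only the compressed word ids and descriptor orientations would be transmitted.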
Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos
This work proposes a self-supervised learning system for segmenting rigid
objects in RGB images. The proposed pipeline is trained on unlabeled RGB-D
videos of static objects, which can be captured with a camera carried by a
mobile robot. A key feature of the self-supervised training process is a
graph-matching algorithm that operates on the over-segmentation output of the
point cloud that is reconstructed from each video. The graph matching, along
with point cloud registration, is able to find reoccurring object patterns
across videos and combine them into 3D object pseudo labels, even under
occlusions or different viewing angles. Projected 2D object masks from 3D
pseudo labels are used to train a pixel-wise feature extractor through
contrastive learning. During online inference, a clustering method uses the
learned features to cluster foreground pixels into object segments. Experiments
highlight the method's effectiveness on both real and synthetic video datasets,
which include cluttered scenes of tabletop objects. The proposed method
outperforms existing unsupervised methods for object segmentation by a large
margin.
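The pixel-wise contrastive training step described above can be sketched as a supervised-contrastive objective in which pixels sharing a 3D pseudo object label are treated as positives. This is a minimal numpy sketch under assumed shapes and names, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)

def pixel_infonce(features, labels, tau=0.1):
    """Supervised-contrastive loss over pixel embeddings (illustrative).

    features: (N, D) L2-normalized pixel embeddings
    labels:   (N,)   pseudo object ids from projected 3D masks
    """
    sim = features @ features.T / tau            # (N, N) scaled similarities
    np.fill_diagonal(sim, -1e9)                  # exclude self-pairs
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = labels[:, None] == labels[None, :]
    np.fill_diagonal(pos, False)
    valid = pos.sum(axis=1) > 0                  # anchors with >= 1 positive
    # mean log-probability of positives for each valid anchor
    per_anchor = np.where(pos[valid], logp[valid], 0.0).sum(1) / pos[valid].sum(1)
    return -per_anchor.mean()

# toy pixel embeddings from two pseudo-labeled objects
feats = rng.normal(size=(64, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
labels = np.repeat([0, 1], 32)
loss = pixel_infonce(feats, labels)
```

Minimizing this pulls same-object pixel features together, so that at inference a simple clustering of foreground pixels recovers object segments.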
ARMBench: An Object-centric Benchmark Dataset for Robotic Manipulation
This paper introduces Amazon Robotic Manipulation Benchmark (ARMBench), a
large-scale, object-centric benchmark dataset for robotic manipulation in the
context of a warehouse. Automation of operations in modern warehouses requires
a robotic manipulator to deal with a wide variety of objects, unstructured
storage, and dynamically changing inventory. Such settings pose challenges in
perceiving the identity, physical characteristics, and state of objects during
manipulation. Existing datasets for robotic manipulation consider a limited set
of objects or utilize 3D models to generate synthetic scenes with limitation in
capturing the variety of object properties, clutter, and interactions. We
present a large-scale dataset collected in an Amazon warehouse using a robotic
manipulator performing object singulation from containers with heterogeneous
contents. ARMBench contains images, videos, and metadata corresponding to
235K+ pick-and-place activities on 190K+ unique objects. The data is captured
at different stages of manipulation, i.e., pre-pick, during transfer, and after
placement. Benchmark tasks are proposed using high-quality annotations, and
baseline performance evaluations are presented for three visual perception
challenges, namely 1) object segmentation in clutter, 2) object identification,
and 3) defect detection. ARMBench can be accessed at http://armbench.com
Comment: To appear at the IEEE Conference on Robotics and Automation (ICRA), 202
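A segmentation-in-clutter task such as the one above is typically scored with per-object mask intersection-over-union; a generic sketch (not ARMBench's official evaluation tooling):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union between two boolean object masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

# toy 4x4 masks: predicted 2x2 object vs. ground-truth 2x3 object
pred = np.zeros((4, 4), dtype=bool); pred[:2, :2] = True
gt   = np.zeros((4, 4), dtype=bool); gt[:2, :3]  = True
iou = mask_iou(pred, gt)   # intersection 4 px, union 6 px
```

A predicted mask is usually counted as correct when its IoU with a ground-truth mask exceeds a threshold such as 0.5.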
Enhanced Interfacial Dzyaloshinskii-Moriya Interaction in annealed Pt/Co/MgO structures
The interfacial Dzyaloshinskii-Moriya interaction (iDMI) is attracting great
interest for spintronics. An iDMI constant larger than 3 mJ/m^2 is expected to
minimize the size of skyrmions and to optimize domain wall (DW) dynamics. In this study,
we experimentally demonstrate an enhanced iDMI in Pt/Co/X/MgO ultra-thin film
structures with perpendicular magnetization. The iDMI constants were measured
using a field-driven creep regime domain expansion method. The enhancement of
iDMI with an atomically thin insertion of Ta and Mg is comprehensively
understood with the help of ab-initio calculations. Thermal annealing has been
used to crystallize the MgO thin layer for improving tunneling
magneto-resistance (TMR), but interestingly it also provides a further increase
of the iDMI constant. An increase of the iDMI constant up to 3.3 mJ/m^2 is
shown, which could be promising for the scaling down of skyrmion electronics
Hierarchical Multi-scale Attention Networks for action recognition
Recurrent Neural Networks (RNNs) have been widely used in natural language
processing and computer vision. Among them, the Hierarchical Multi-scale RNN
(HM-RNN), a kind of multi-scale hierarchical RNN proposed recently, can learn
the hierarchical temporal structure from data automatically. In this paper, we
extend the work to solve the computer vision task of action recognition.
However, in sequence-to-sequence models such as RNNs, it is normally very hard
to discover the relationships between inputs and outputs given static inputs.
As a solution, an attention mechanism can be applied to extract the relevant
information from the input, thus facilitating the modeling of input-output
relationships. Based on these considerations, we propose a novel attention
network, namely Hierarchical Multi-scale Attention Network (HM-AN), by
combining the HM-RNN and the attention mechanism and apply it to action
recognition. A newly proposed gradient estimation method for stochastic
neurons, namely Gumbel-softmax, is exploited to implement the temporal boundary
detectors and the stochastic hard attention mechanism. To ameliorate the
negative effect of the Gumbel-softmax's temperature sensitivity, an adaptive
temperature training method is applied to improve system performance. The
experimental results demonstrate the improvement of HM-AN over LSTM with
attention on the vision task. Through visualization of what has been learned by
the networks, it can be observed that HM-AN captures both the attention regions
of images and the hierarchical temporal structure.
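The Gumbel-softmax estimator mentioned above relaxes categorical sampling into a differentiable form whose sharpness is controlled by a temperature; a minimal forward-pass sketch in numpy (illustrative values, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

def gumbel_softmax(logits, tau):
    """Differentiable relaxation of sampling from a categorical distribution."""
    u = rng.uniform(size=logits.shape)
    g = -np.log(-np.log(u + 1e-20) + 1e-20)   # Gumbel(0, 1) noise
    y = (logits + g) / tau                    # temperature controls sharpness
    y -= y.max()                              # numerical stability
    e = np.exp(y)
    return e / e.sum()

logits = np.array([1.0, 2.0, 0.5])
soft = gumbel_softmax(logits, tau=1.0)   # smooth sample, well-behaved gradients
hard = gumbel_softmax(logits, tau=0.1)   # near one-hot, like a boundary detector firing
```

Low temperatures give near one-hot samples, as needed for hard attention and binary boundary detectors, but noisier gradients; that trade-off is exactly the sensitivity the adaptive temperature method targets.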
Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine
Tensegrity robots, composed of rigid rods and flexible cables, exhibit high
strength-to-weight ratios and extreme deformations, enabling them to navigate
unstructured terrain and even survive harsh impacts. However, they are hard to
control due to their high dimensionality, complex dynamics, and coupled
architecture. Physics-based simulation is one avenue for developing locomotion
policies that can then be transferred to real robots, but modeling tensegrity
robots is a complex task, so simulations experience a substantial sim2real gap.
To address this issue, this paper describes a Real2Sim2Real strategy for
tensegrity robots. This strategy is based on a differentiable physics engine that
can be trained given limited data from a real robot (i.e. offline measurements
and one random trajectory) and achieve a high enough accuracy to discover
transferable locomotion policies. Beyond the overall pipeline, key
contributions of this work include computing non-zero gradients at contact
points, a loss function, and a trajectory segmentation technique that avoid
conflicts in gradient evaluation during training. The proposed pipeline is
demonstrated and evaluated on a real 3-bar tensegrity robot.
Comment: Submitted to ICRA202
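The paper's exact contact-gradient computation is not reproduced here, but the general trick of keeping gradients alive at contact points can be illustrated by smoothing a hard penalty force, so that optimization through the engine receives a signal even when the body is not yet in contact (all constants illustrative):

```python
import numpy as np

def hard_contact_force(x, k=100.0):
    # hard penalty: force (and its gradient w.r.t. x) is exactly zero
    # whenever x > 0, i.e. no penetration
    return k * np.maximum(0.0, -x)

def soft_contact_force(x, k=100.0, beta=50.0):
    # softplus smoothing of the same penalty: small but non-zero
    # force and gradient near contact
    return k * np.log1p(np.exp(-beta * x)) / beta

x = 0.01   # body 1 cm above the ground plane
eps = 1e-6 # central finite differences w.r.t. x
g_hard = (hard_contact_force(x + eps) - hard_contact_force(x - eps)) / (2 * eps)
g_soft = (soft_contact_force(x + eps) - soft_contact_force(x - eps)) / (2 * eps)
# g_hard vanishes; g_soft stays non-zero, so gradient-based training can
# still adjust parameters that move the body toward or away from contact
```

The smoothing parameter trades physical fidelity against gradient usefulness, which is one reason trajectory segmentation and a carefully chosen loss are needed to avoid conflicting gradient signals during training.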
- …